Optimizing the F-Measure in Multi-Label Classification: Plug-in Rule Approach versus Structured Loss Minimization
Authors
Abstract
We compare the plug-in rule approach for optimizing the Fβ-measure in multi-label classification with an approach based on structured loss minimization, such as the structured support vector machine (SSVM). Whereas the former derives an optimal prediction from a probabilistic model in a separate inference step, the latter seeks to optimize the Fβ-measure directly during the training phase. We introduce a novel plug-in rule algorithm that estimates all parameters required for a Bayes-optimal prediction via a set of multinomial regression models, and we compare this algorithm with SSVMs in terms of computational complexity and statistical consistency. As a main theoretical result, we show that our plug-in rule algorithm is consistent, whereas the SSVM approaches are not. Finally, we present results of a large experimental study showing the benefits of the introduced algorithm.
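The abstract does not spell out the inference step of the plug-in rule, so what follows is only a minimal sketch of it. It assumes the plug-in parameters are the probabilities Pr(y_i = 1, |y| = s) for each label i and label-set size s, together with Pr(y = 0), as in the general F-measure maximizer known from the F-measure literature. In the described method these would be estimated by the multinomial regression models, which are not shown here. A toy joint distribution stands in for those estimates. For a prediction h with k ≥ 1 positive labels, E[F_β] = Σ_i h_i Σ_s Pr(y_i = 1, |y| = s)(1 + β²)/(β² s + k), so for each fixed k the best h takes the k labels with the largest inner sums. The empty prediction scores Pr(y = 0). All function names are hypothetical, and this is not the authors' implementation.

```python
from itertools import product

def fbeta(y, h, beta=1.0):
    """F_beta between a true label vector y and a prediction h (0/1 tuples).
    Convention: F(0, 0) = 1, i.e. predicting no labels for an empty y is perfect."""
    tp = sum(a * b for a, b in zip(y, h))
    denom = beta ** 2 * sum(y) + sum(h)
    return 1.0 if denom == 0 else (1 + beta ** 2) * tp / denom

def plugin_fbeta_prediction(P, p_zero, beta=1.0):
    """Bayes-optimal F_beta prediction from plug-in parameters (hypothetical helper).

    P[i][s - 1]: estimate of Pr(y_i = 1, |y| = s), for labels i and sizes s = 1..m
    p_zero:      estimate of Pr(y = 0), the probability of an empty label set
    Returns the prediction h and its expected F_beta under these estimates.
    """
    m = len(P)
    b2 = beta ** 2
    best_h, best_val = (0,) * m, p_zero  # empty prediction scores 1 only when y is empty
    for k in range(1, m + 1):
        # Expected F_beta contribution of label i inside a prediction of size k.
        delta = [sum(P[i][s - 1] * (1 + b2) / (b2 * s + k) for s in range(1, m + 1))
                 for i in range(m)]
        top = sorted(range(m), key=lambda i: delta[i], reverse=True)[:k]
        val = sum(delta[i] for i in top)
        if val > best_val:
            best_h = tuple(1 if i in top else 0 for i in range(m))
            best_val = val
    return best_h, best_val

def plugin_parameters(joint, m):
    """Exact P and p_zero from a known joint distribution {y: prob} (toy stand-in
    for the regression estimates)."""
    P = [[0.0] * m for _ in range(m)]
    p_zero = 0.0
    for y, p in joint.items():
        s = sum(y)
        if s == 0:
            p_zero += p
            continue
        for i in range(m):
            if y[i]:
                P[i][s - 1] += p
    return P, p_zero

def expected_fbeta(joint, h, beta=1.0):
    """Expected F_beta of a fixed prediction h under the joint distribution."""
    return sum(p * fbeta(y, h, beta) for y, p in joint.items())

if __name__ == "__main__":
    joint = {(1, 1, 0): 0.4, (0, 0, 1): 0.3, (1, 0, 0): 0.2, (0, 0, 0): 0.1}
    P, p_zero = plugin_parameters(joint, 3)
    h, val = plugin_fbeta_prediction(P, p_zero, beta=1.0)
    # Brute-force check over all 2^m predictions.
    brute = max(expected_fbeta(joint, g) for g in product((0, 1), repeat=3))
    print(h, round(val, 4), round(brute, 4))  # prints (1, 1, 1) 0.57 0.57
```

The brute-force comparison is only feasible for tiny m. The point of the plug-in sketch is that, once the parameters are available, it needs just m + 1 candidate predictions instead of 2^m.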
Similar resources
Exploiting Associations between Class Labels in Multi-label Classification
Multi-label classification has many applications in text categorization, biology and medical diagnosis, in which multiple class labels can be assigned to each training instance simultaneously. As it is often the case that there are relationships between the labels, extracting the existing relationships between the labels and taking advantage of them during the training or prediction phases ...
Scalable Optimization of Multivariate Performance Measures in Multi-instance Multi-label Learning
The problem of multi-instance multi-label learning (MIML) requires a bag of instances to be assigned a set of labels most relevant to the bag as a whole. The problem finds numerous applications in machine learning, computer vision, and natural language processing settings where only partial or distant supervision is available. We present a novel method for optimizing multivariate performance me...
On the Bayes-optimality of F-measure maximizers
The F-measure, which was originally introduced in information retrieval, is nowadays routinely used as a performance metric for problems such as binary classification, multi-label classification, and structured output prediction. Optimizing this measure is a statistically and computationally challenging problem, since no closed-form solution exists. Adopting a decision-theoretic perspectiv...
An Exact Algorithm for F-Measure Maximization
The F-measure, originally introduced in information retrieval, is nowadays routinely used as a performance metric for problems such as binary classification, multi-label classification, and structured output prediction. Optimizing this measure remains a statistically and computationally challenging problem, since no closed-form maximizer exists. Current algorithms are approximate and typically ...